Don't reshuffle eval data each "epoch" #229
Conversation
olmo/util.py
Outdated
-        if isinstance(dataloader.sampler, DistributedSampler):
+        if update_epoch_seed and isinstance(dataloader.sampler, DistributedSampler):
             epoch = dataloader.sampler.epoch + 1
             dataloader.sampler.set_epoch(epoch)
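For context, a minimal sketch of what the surrounding helper might look like (the body below is illustrative and assumed; only the `update_epoch_seed` guard is taken from the diff itself):

```python
from typing import Iterator

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def cycle_through_epochs(dataloader: DataLoader, update_epoch_seed: bool = True) -> Iterator:
    """Yield batches from `dataloader` indefinitely, one epoch after another.

    When `update_epoch_seed` is False, the sampler's epoch (and therefore its
    shuffle seed) is left alone, so every pass sees the data in the same order.
    """
    while True:
        for batch in dataloader:
            yield batch
        # Only advance the shuffle seed if requested (e.g. for training data).
        if update_epoch_seed and isinstance(dataloader.sampler, DistributedSampler):
            epoch = dataloader.sampler.epoch + 1
            dataloader.sampler.set_epoch(epoch)
```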
I don't understand how skipping this makes sure it starts from scratch every epoch. I thought we'd have to reset the data loader somehow every time.
Each validation loop through a dataset is a complete loop / epoch over the dataset (less the extra examples, since `drop_last=True`). The `DistributedSampler` reshuffles each epoch with a specific seed that's shared across all processes. That seed never changes unless you call `DistributedSampler.set_epoch()`. See https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler.
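A minimal, self-contained illustration of that behavior (not code from this repo; it constructs a single-process `DistributedSampler` with explicit `num_replicas`/`rank` so it runs without `init_process_group`):

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True, drop_last=True)

# Without set_epoch(), the shuffle seed is fixed, so every pass yields the same order.
assert list(sampler) == list(sampler)
print("order at epoch 0:", list(sampler))

# Bumping the epoch changes the seed, and with it the order.
sampler.set_epoch(1)
print("order at epoch 1:", list(sampler))
```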
But the fact that this is confusing, err, kind of works by exploiting a nuance of the `DistributedSampler`, tells me that this probably isn't the best way.
Updated: 9b52525
Do we still need `cycle_through_epoch` then? Or at least, do we still need this change?
Removed: ac8322c
I'm assuming then that we don't use `cycle_through_epochs` for the training set?
Right
Keep eval data order consistent across iterations. Add paths to metadata in the `MemMapDataset` class.
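A rough sketch of what attaching source paths to per-item metadata could look like; this is a toy stand-in, not the actual `MemMapDataset` implementation, and the class and field names below are assumptions:

```python
from pathlib import Path
from typing import Any, Dict, List

import numpy as np
from torch.utils.data import Dataset


class SimpleMemMapDataset(Dataset):
    """Toy memory-mapped token dataset that records which file each chunk came from."""

    def __init__(self, paths: List[Path], chunk_size: int = 1024, dtype=np.uint16):
        self._memmaps = [np.memmap(p, mode="r", dtype=dtype) for p in paths]
        self._paths = paths
        self._chunk_size = chunk_size
        # Number of full chunks per file (partial trailing chunks are dropped).
        self._chunks_per_file = [len(m) // chunk_size for m in self._memmaps]

    def __len__(self) -> int:
        return sum(self._chunks_per_file)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Map the flat index to (file, local chunk) and include the source path in the metadata.
        for file_idx, n_chunks in enumerate(self._chunks_per_file):
            if index < n_chunks:
                start = index * self._chunk_size
                chunk = self._memmaps[file_idx][start : start + self._chunk_size]
                return {
                    "input_ids": np.asarray(chunk),
                    "metadata": {"path": str(self._paths[file_idx]), "chunk_index": index},
                }
            index -= n_chunks
        raise IndexError(index)
```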